Abstract

Many NP-complete constraint satisfaction problems appear to undergo a "phase transition" 
from solubility to insolubility when the constraint density passes through a critical threshold. In 
all such cases it is easy to derive upper bounds on the location of the threshold by showing that 
above a certain density the first moment (expectation) of the number of solutions tends to zero. 
We show that in the case of certain symmetric constraints, considering the second moment of 
the number of solutions yields nearly matching lower bounds for the location of the threshold. 
Specifically, we prove that the threshold for both random hypergraph 2-colorability (Property
B) and random Not-All-Equal k-SAT is $2^{k-1} \ln 2 - O(1)$. As a corollary, we establish that the
threshold for random k-SAT is of order $\Theta(2^k)$, resolving a long-standing open problem.

1 Introduction 

In the early 1900s, Bernstein j2] asked the following question: given a collection of subsets of a set 
V, is there a partition of V into $V_1$, $V_2$ such that no subset is contained in either $V_1$ or $V_2$? If we think
of the elements of V as vertices and of each subset as a hyperedge, the question can be rephrased as 
whether a given hypergraph can be 2-colored so that no hyperedge is monochromatic. Of particular 
interest is the setting where all hyperedges contain k vertices, i.e., k-uniform hypergraphs. This
question was popularized by Erdős, who dubbed it "Property B" in honor of Bernstein, and has
motivated some of the deepest advances in probabilistic combinatorics. Indeed, determining the
smallest number of hyperedges in a non-2-colorable k-uniform hypergraph remains one of the most
important problems in extremal graph theory, perhaps second only to the Ramsey problem ^21 ■ 

A more modern problem, with a somewhat similar flavor, is Boolean Satisfiability: given a CNF 
formula F, is it possible to assign truth values to the variables of F so that it evaluates to true? 
Satisfiability has been the central problem of computational complexity since 1971 when Cook [21]
proved that it is complete for the class NP. The case where all clauses have the same size k is known
as k-SAT and is NP-complete for all $k \ge 3$.

For both k-SAT and Property B it is common to generate random instances by selecting a
corresponding structure at random. Indeed, random formulas and random hypergraphs have been 
studied extensively in probabilistic combinatorics in the last three decades. While there are a 
number of slightly different models for generating such structures "uniformly at random", we will 
see that results transfer readily between them. For the sake of concreteness, let $F_k(n,m)$ denote a
formula chosen uniformly among all $\binom{2^k \binom{n}{k}}{m}$ k-CNF formulas on n variables with m clauses. Similarly,
let $H_k(n,m)$ denote a hypergraph chosen uniformly among all $\binom{\binom{n}{k}}{m}$ k-uniform hypergraphs
with n vertices and m hyperedges. We will say that a sequence of events $\mathcal{E}_n$ occurs with high
probability (w.h.p.) if $\lim_{n\to\infty} \Pr[\mathcal{E}_n] = 1$ and with uniformly positive probability (w.u.p.p.) if
$\liminf_{n\to\infty} \Pr[\mathcal{E}_n] > 0$. Throughout the paper, k will be arbitrarily large but fixed.

In recent years, both problems have been understood to undergo a "phase transition" as the
ratio of constraints to variables passes through a critical threshold. That is, for a given number of
vertices (variables), the probability that a random instance has a solution drops rapidly from 1 to
0 around a critical number of hyperedges (clauses). This sharp threshold phenomenon was discovered
in the early 1990s, when several researchers [18, 48] performed computational experiments on
$F_3(n, m = rn)$ and found that while for r < 4.1 almost all formulas are satisfiable, for r > 4.3
almost all are unsatisfiable. Moreover, as n increases, this transition narrows around $r \approx 4.2$.
Along with similar results for other fixed $k \ge 3$ this has led to the following popular conjecture:

Satisfiability Threshold Conjecture: For each $k \ge 3$, there exists a constant $r_k$ such that

\[
\lim_{n\to\infty} \Pr[F_k(n, rn) \text{ is satisfiable}] =
\begin{cases}
1 & \text{if } r < r_k \\
0 & \text{if } r > r_k .
\end{cases}
\]



In the last ten years, this conjecture has become an active area of interdisciplinary research, 
receiving attention in theoretical computer science, artificial intelligence, combinatorics and, more 
recently, statistical physics. Much of the work on random k-SAT has focused on proving upper and
lower bounds for $r_k$, both for the smallest computationally hard case k = 3 and for general k. At
this point the existence of $r_k$ has not been established for any $k \ge 3$. Nevertheless, we will take the
liberty of writing $r_k \ge r^*$ to denote that for all $r < r^*$, $F_k(n,rn)$ is w.h.p. satisfiable; analogously,
we will write $r_k \le r^*$ to denote that for all $r > r^*$, $F_k(n,rn)$ is w.h.p. unsatisfiable.

As we will see, an elementary counting argument yields $r_k \le 2^k \ln 2$ for all k. Lower bounds, on
the other hand, have been exclusively algorithmic: to establish $r_k \ge r^*$ one shows that for $r < r^*$
some specific algorithm finds a satisfying assignment with probability that tends to 1. We will see
that an extremely simple algorithm ^Hj already yields $r_k = \Omega(2^k/k)$. We will also see that while
more sophisticated algorithms improve this bound slightly, to date no algorithm is known to find a
satisfying truth assignment (even w.u.p.p.) when $r = \omega(k) \times 2^k/k$, for any $\omega(k) \to \infty$.

The threshold picture for hypergraph 2-colorability is completely analogous: for each $k \ge 3$, it
is conjectured that there exists a constant $c_k$ such that

\[
\lim_{n\to\infty} \Pr[H_k(n, cn) \text{ is 2-colorable}] =
\begin{cases}
1 & \text{if } c < c_k \\
0 & \text{if } c > c_k .
\end{cases}
\]



The same counting argument here implies $c_k < 2^{k-1} \ln 2$, while another simple algorithm yields
$c_k = \Omega(2^k/k)$. Again, no algorithm is known to improve this bound asymptotically, leaving a
multiplicative gap of order $\Theta(k)$ between the upper and lower bound for this problem as well.

In this paper, we use the second moment method to show that random k-CNF formulas are
satisfiable, and random k-uniform hypergraphs are 2-colorable, for density up to $2^{k-1} \ln 2 - O(1)$.
Thus, we determine the threshold for random k-SAT within a factor of two and the threshold for
Property B within a small additive constant.

Recall that $F_k(n,rn)$ is w.h.p. unsatisfiable if $r > 2^k \ln 2$. Our first main result is

Theorem 1 For all $k \ge 3$, $F_k(n, m = rn)$ is w.h.p. satisfiable if $r < 2^{k-1} \ln 2 - 2$.

Our second main result determines the Property B threshold within an additive 1/2 + o(1).

Theorem 2 For all $k \ge 3$, $H_k(n, m = cn)$ is w.h.p. non-2-colorable if

\[ c > 2^{k-1} \ln 2 - \frac{\ln 2}{2} . \tag{1} \]

There exists a sequence $t_k \to 0$ such that for all $k \ge 3$, $H_k(n, m = cn)$ is w.h.p. 2-colorable if

\[ c < 2^{k-1} \ln 2 - \frac{\ln 2}{2} - \frac{1 + t_k}{2} . \tag{2} \]

The upper bound in (1) corresponds to the density for which the expected number of 2-colorings
of $H_k(n,cn)$ is o(1). Our main contribution is inequality (2), which we prove using the second
moment method. In fact, our approach yields explicit bounds for the hypergraph 2-colorability
threshold for each value of k (although ones that lack an attractive closed form). We give the first
few of these bounds in Table 1. We see that the gap between our upper and lower bounds converges
to its limiting value of 1/2 rather rapidly.



k             3      4      5       6       7       8       9        10       11
Upper Bound   2.410  5.191  10.741  21.833  44.014  88.376  177.099  354.545  709.436
Lower Bound   1.5    4.083  9.973   21.190  43.432  87.827  176.570  354.027  708.925

Table 1: Bounds for the 2-colorability threshold of random k-uniform hypergraphs.
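The Upper Bound row can be reproduced under the assumption that it is the exact density at which the first-moment base $2(1 - 2^{1-k})^c$ equals 1, with the table values apparently rounded up in the last digit. The following Python sketch compares that exact density with the closed form in (1); the function names are illustrative and do not come from the paper.

```python
import math


def first_moment_upper_bound(k: int) -> float:
    """Density c at which 2 * (1 - 2**(1 - k))**c equals exactly 1."""
    return math.log(2) / -math.log(1 - 2.0 ** (1 - k))


def asymptotic_upper_bound(k: int) -> float:
    """Closed form appearing in inequality (1): 2^(k-1) ln 2 - (ln 2)/2."""
    return 2 ** (k - 1) * math.log(2) - math.log(2) / 2


# Print k, the exact first-moment density, and the closed-form bound.
for k in range(3, 12):
    print(k, round(first_moment_upper_bound(k), 3), round(asymptotic_upper_bound(k), 3))
```

The exact density is always slightly below the closed form, which is why (1) is a valid, if marginally weaker, statement of the same first-moment bound.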

Unlike the algorithmic lower bounds for random k-SAT and hypergraph 2-colorability, our
arguments are non-constructive: we establish that w.h.p. solutions exist for certain densities but 
do not offer any hint on how to find them. We believe that abandoning the algorithmic approach 
for proving such lower bounds is natural and, perhaps, necessary. At a minimum, the algorithmic 
approach is limited to the small set of rather naive algorithms whose analysis is tractable using 
current techniques. Perhaps more gravely, it could be that no polynomial algorithm can overcome 
the $\Theta(2^k/k)$ barrier. Determining whether this is true even for certain limited classes of algorithms,
e.g., random walk algorithms, is a very interesting open problem. 

In addition, by not seeking out some specific truth assignment, as algorithms do, the second 
moment method gives some first glimpses of the "geometry" of the set of solutions. Deciphering 
these first glimpses, getting clearer ones, and exploring potential interactions between the geometry 
of the set of solutions and computational hardness are great challenges lying ahead. 

We note that recently, and independently, Frieze and Wormald [33] applied the second moment 
method to random k-SAT in the case where k is a moderately growing function of n. Specifically,
they proved that when $k - \log_2 n \to \infty$, $F_k(n, m)$ is w.h.p. satisfiable if $m < (1 - \epsilon) m^*$ but w.h.p.
unsatisfiable if $m > (1 + \epsilon) m^*$, where $m^* = (2^k \ln 2 - O(1)) n$ and $\epsilon = \epsilon(n) > 0$ is such that
$\epsilon n \to \infty$. Their result follows by a direct application of the second moment method to the number
of satisfying assignments of $F_k(n, m)$. As we will see shortly, while this approach gives a very sharp
bound when $k - \log_2 n \to \infty$, it fails for any fixed k and indeed for any $k = o(\log n)$.
We also note that since this work first appeared [IJ[S], the line of attack we put forward has had
several other successful applications. Specifically, the lower bound for the random k-SAT
threshold was improved to $2^k \ln 2 - O(k)$ by building on the insights presented here. In [Hj, the
method was successfully extended to random Max k-SAT jH|, while in [H] it was applied to random
graph coloring. We discuss these subsequent developments in the Conclusions.



1.1 The second moment method and the role of symmetry 

The version of the second moment method we will use is given by Lemma 1 below and follows from
a direct application of the Cauchy-Schwarz inequality (see e.g., Remark 3.1 in [37]).

Lemma 1 For any non-negative random variable X,

\[ \Pr[X > 0] \ge \frac{E[X]^2}{E[X^2]} . \tag{3} \]
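For completeness, the Cauchy-Schwarz step behind Lemma 1 can be written out as follows. This is the standard derivation, sketched here rather than quoted from the cited remark; $\mathbf{1}_{X>0}$ denotes the indicator of the event X > 0, a symbol introduced only for this sketch.

```latex
\begin{align*}
E[X] &= E\big[X \cdot \mathbf{1}_{X>0}\big]
      \le \sqrt{E[X^2]\; E\big[\mathbf{1}_{X>0}^2\big]}
      = \sqrt{E[X^2]\; \Pr[X>0]} , \\
\text{so}\quad \Pr[X>0] &\ge \frac{E[X]^2}{E[X^2]}
      \quad \text{whenever } E[X^2] > 0 .
\end{align*}
```

The first equality uses only that X is non-negative, so X vanishes wherever the indicator does.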

It is natural to try to apply Lemma 1 to random k-SAT by letting X be the number of satisfying
truth assignments of $F_k(n,m)$. Unfortunately, as we will see, this "naive" application of the second
moment method fails rather dramatically: for all $k \ge 1$ and every r > 0, $E[X^2] > (1 + \beta)^n E[X]^2$
for some $\beta = \beta(k,r) > 0$. As a result, the second moment method only gives an exponentially
small lower bound on the probability of satisfiability.

The key step in overcoming this failure lies in realizing that we are free to apply the second
moment method to any random variable X such that X > 0 implies that the formula is satisfiable.
In particular, we can let X be the size of any subset of the set of satisfying assignments. By
choosing this subset carefully, we can hope to significantly reduce the variance of X relative to
its expectation and use Lemma 1 to prove that the subset is frequently non-empty. Indeed, we
will establish the satisfiability of random k-CNF by focusing on those satisfying truth assignments
whose complement is also satisfying. In Section 3 we will give some intuition for why the number
of such assignments has much smaller variance than the number of all satisfying assignments. For
now, we observe that considering only such satisfying assignments is equivalent to interpreting the
random k-CNF formula $F_k(n,m)$ as an instance of Not-All-Equal (NAE) k-SAT, where a truth
assignment $\sigma$ is a solution if and only if under $\sigma$ every clause contains at least one satisfied literal
and at least one unsatisfied literal. In other words, our lower bound for the k-SAT threshold in
Theorem 1 is, in fact, a lower bound for the NAE k-SAT threshold.

Indeed, for both random NAE k-SAT and random hypergraph 2-colorability we will apply
Lemma 1 naively, i.e., by letting X be the number of solutions. This will give Theorem 2 and
the values in Table 1 for hypergraph 2-colorability and, as we will see, exactly the same bounds
for random NAE k-SAT. (Indeed, the proof of Theorem 2 is a slight generalization of the proof
for random NAE k-SAT.) We will see that this success of the naive second moment is due to
the symmetry inherent in both problems, i.e., to the fact that the complement of a solution is
also a solution. Indeed, we feel that highlighting this role of symmetry, and showing how it
can be exploited even in asymmetric problems like k-SAT, is our main conceptual contribution.
Exploiting these ideas in other Constraint Satisfaction problems that have a permutation group
acting on the variables' domain is an interesting area for further research.
1.2 Organization of the paper

In Section 2 we give some background on random k-SAT and random hypergraph 2-colorability. In
Section 3 we explain why the second moment method fails when applied to k-SAT directly, and give
some intuition for why counting only the NAE-satisfying assignments rectifies the problem. We also
point out some connections to methods of statistical physics. In Section 4 we lay the groundwork for
bounding the second moment for both NAE k-SAT and hypergraph 2-colorability by dealing with
some probabilistic preliminaries, introducing a "Laplace method" lemma for bounding certain sums,
and outlining our strategy. The actual bounding occurs in Sections 5 to 7. Specifically, in Sections 5
and 6 we use the Laplace lemma to reduce the second moment calculations for both random NAE
k-SAT and random hypergraph 2-colorability to the maximization of a certain function g on the
unit interval, where g is independent of n. We maximize g in Section 7 and prove the Laplace
lemma in Section 8. We conclude in Section 9 by discussing some recent extensions of this work
and proposing several open questions.

2 Related Work 
2.1 Random k-SAT

The mathematical investigation of random k-SAT began with the work of Franco and Paull [30]
who, among other results, observed that $F_k(n, m = rn)$ is w.h.p. unsatisfiable if $r > 2^k \ln 2$. To see
this, let $C_k = 2^k \binom{n}{k}$ be the number of all possible k-clauses and let $S_k = (2^k - 1) \binom{n}{k}$ be the number
of k-clauses consistent with a given truth assignment. Since any fixed truth assignment is satisfying
with probability $\binom{S_k}{m} / \binom{C_k}{m} < (1 - 2^{-k})^m$, the expected number of satisfying truth assignments of
$F_k(n, m = rn)$ is at most $[2(1 - 2^{-k})^r]^n = o(1)$ for $r > 2^k \ln 2$.
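The final step of this first-moment argument can be spelled out with the elementary inequality $1 - x < e^{-x}$ for $x > 0$. The display below is a short worked version of that step, added as a sketch; it is not a quotation from Franco and Paull.

```latex
\begin{align*}
2\,(1-2^{-k})^{r} \;<\; 2\,e^{-r\,2^{-k}} \;\le\; 2\,e^{-\ln 2} \;=\; 1
\qquad \text{whenever } r \ge 2^{k}\ln 2 .
\end{align*}
```

For fixed k and r the base is therefore a constant strictly below 1, so its n-th power tends to zero, and Markov's inequality bounds the probability that at least one satisfying assignment exists by this expectation.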

Shortly afterwards, Chao and Franco complemented this result by proving that for all $k \ge 3$,
if $r < 2^k/k$ then the following linear-time algorithm, called Unit Clause (UC), finds a satisfying
truth assignment w.u.p.p.: if there exist unit clauses, pick one randomly and satisfy it; else pick
a random unset variable and give it a random value. Note that since UC succeeds only w.u.p.p.
(rather than w.h.p.) this does not imply a lower bound for $r_k$.
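The Unit Clause rule just described can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code: it rescans the clause list at every step, so it is not linear-time, and the helper names (`random_kcnf`, `unit_clause`, `satisfies`) and the integer encoding of literals are assumptions made for the example.

```python
import random


def random_kcnf(n, m, k, rng):
    """m clauses, each on k distinct variables with independent random signs.
    Literals are nonzero ints: +v means x_v, -v means NOT x_v."""
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n + 1), k)]
            for _ in range(m)]


def satisfies(clauses, assignment):
    """True if every clause has at least one literal made true by assignment."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)


def unit_clause(clauses, n, rng):
    """Run UC once; return {var: bool}, or None if an empty clause appears."""
    assignment = {}
    active = [list(c) for c in clauses]
    while True:
        units = [c for c in active if len(c) == 1]
        if units:
            lit = rng.choice(units)[0]          # satisfy a random unit clause
        else:
            unset = [v for v in range(1, n + 1) if v not in assignment]
            if not unset:
                return assignment               # all variables set, no conflict
            v = rng.choice(unset)               # free step: random variable,
            lit = v if rng.random() < 0.5 else -v   # random value
        assignment[abs(lit)] = lit > 0
        remaining = []
        for c in active:
            if lit in c:
                continue                        # clause is now satisfied
            reduced = [x for x in c if x != -lit]   # drop the falsified literal
            if not reduced:
                return None                     # contradiction: UC never backtracks
            remaining.append(reduced)
        active = remaining
```

A call such as `unit_clause(random_kcnf(1000, 500, 3, rng), 1000, rng)` with `rng = random.Random(0)` runs one attempt; because UC does not backtrack, a `None` result says nothing about whether the formula is satisfiable.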

The satisfiability threshold conjecture gained a great deal of popularity in the early 1990s and
has received an increasing amount of attention since then. The polynomial-time solvable case
k = 2 was settled early on: independently, Chvátal and Reed, Fernandez de la Vega |2E]>, and
Goerdt [HI] proved that $r_2 = 1$. Chvátal and Reed [19], in addition to proving $r_2 = 1$, gave the
first lower bound for $r_k$, strengthening the positive-probability result of Chao and Franco by
analyzing the following refinement of UC, called Short Clause (SC): if there exist unit clauses,
pick one randomly and satisfy it; else if there exist binary clauses, pick one randomly and satisfy
a random literal in it; else pick a random unset variable and give it a random value. In [19], the
authors showed that for all $k \ge 3$, SC finds a satisfying truth assignment w.h.p. for $r < (3/8) 2^k/k$
and raised the question of whether this lower bound for $r_k$ can be improved asymptotically.

A large fraction of the work on the satisfiability threshold conjecture since then has been devoted
to the first computationally hard case, k = 3, and a long series of results ^3 El OH HI El EH El
EHl E3 EM ESI EH has narrowed the potential range of $r_3$. Currently this is pinned between
3.52 by Kaporis, Kirousis and Lalas HOj and Hajiaghayi and Sorkin [35] and 4.506 by Dubois and
Boufkhad [24]. Upper bounds for $r_3$ come from probabilistic counting arguments, refining the above
calculation of the expected number of satisfying assignments. Lower bounds, on the other hand,
have come from analyzing progressively more sophisticated algorithms. Unfortunately, neither of
these approaches helps narrow the asymptotic gap between the upper and lower bounds for $r_k$. The
upper bounds only improve $r_k \le 2^k \ln 2$ by a small additive constant; the best algorithmic lower
bound, due to Frieze and Suen [22], is $r_k \ge a_k 2^k/k$ where $\lim_{k\to\infty} a_k = 1.817\ldots$

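There are two more results that stand out in the study of random k-CNF formulas. In a
breakthrough paper, Friedgut [31] proved the existence of a non-uniform satisfiability threshold,
i.e., of a sequence $r_k(n)$ around which the probability of satisfiability goes from 1 to 0.

Theorem 3 ([31]) For each $k \ge 2$, there exists a sequence $r_k(n)$ such that for every $\epsilon > 0$,

\[
\lim_{n\to\infty} \Pr[F_k(n, rn) \text{ is satisfiable}] =
\begin{cases}
1 & \text{if } r = (1 - \epsilon)\, r_k(n) \\
0 & \text{if } r = (1 + \epsilon)\, r_k(n) .
\end{cases}
\]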



In [20], Chvátal and Szemerédi established a seminal result in proof complexity, by extending
the work of Haken and Urquhart jS2] to random formulas. Specifically, they proved that for
all $k \ge 3$, if $r > 2^k \ln 2$ then w.h.p. $F_k(n, rn)$ is unsatisfiable but every resolution proof of its
unsatisfiability contains at least $(1 + \epsilon)^n$ clauses, for some $\epsilon = \epsilon(k,r) > 0$. In [2], Achlioptas, Beame
and Molloy extended the main result of [20] to random CNF formulas that also contain 2-clauses,
as this is relevant for the behavior of Davis-Putnam (DPLL) algorithms on random k-CNF. (DPLL
algorithms proceed by setting variables sequentially, according to some heuristic, and backtracking
whenever a contradiction is reached.) By combining the results in the present paper with the
results in [2], it was recently shown [3] that a number of DPLL algorithms require exponential time
significantly below the satisfiability threshold, i.e., for provably satisfiable random k-CNF formulas.

Finally, we note that if one chooses to live unencumbered by the burden of mathematical 
proof, powerful non-rigorous techniques of statistical physics, such as the "replica method", become 
available. Indeed, several claims based on the replica method have been subsequently established 
rigorously, so it is frequently (but definitely not always) correct. Using this technique, Monasson
and Zecchina [12] predicted $r_k \sim 2^k \ln 2$. Like most arguments based on the replica method, their
argument is mathematically sophisticated but far from rigorous. In particular, they argue that as
k grows large, the so-called annealed approximation should apply. This creates an analogy with the
second moment method which we discuss in Section 3.4.

2.2 Random hypergraph 2-colorability 

While Bernstein ^3] originally raised the 2-colorability question for certain classes of infinite set
families, Erdős popularized the finite version of the problem ^3 [27] H21 EI EH EE] and the
hypergraph representation. Recall that a 2-uniform hypergraph, i.e., a graph, is 2-colorable if and
only if it has no odd cycle. In a random graph with cn edges this occurs with constant probability
if and only if c < 1/2 (see [29] for more on the evolution of cycles in random graphs).

For all $k \ge 3$, on the other hand, hypergraph 2-colorability is NP-complete [45] and determining
the 2-colorability threshold $c_k$ for k-uniform hypergraphs $H_k(n,cn)$ remains open. Analogously to
random k-SAT, we will take the liberty of writing $c_k \ge c^*$ if $H_k(n,cn)$ is 2-colorable w.h.p. for all
$c < c^*$, and $c_k \le c^*$ if $H_k(n, cn)$ is w.h.p. non-2-colorable for all $c > c^*$.

Alon and Spencer JJ| were the first to give bounds on the potential value of $c_k$. Specifically, they
observed that, analogously to random k-SAT, the expected number of 2-colorings of $H_k(n, cn)$ is at
most $[2(1 - 2^{1-k})^c]^n$ and concluded that $H_k(n, cn)$ is w.h.p. non-2-colorable if $c > 2^{k-1} \ln 2$. More
importantly, by employing the Lovász local lemma, they proved that $H_k(n, cn)$ is w.h.p. 2-colorable
if $c = O(2^k/k^2)$. Regarding the upper bound, it is easy to see that, in fact, $2(1 - 2^{1-k})^c < 1$ if
$c = 2^{k-1} \ln 2 - (\ln 2)/2$ and this yields the upper bound of Theorem 2. Moreover, the techniques
of [43, 23] can be used to improve this bound further to $2^{k-1} \ln 2 - (\ln 2)/2 - 1/4 + t_k$, where $t_k \to 0$.

The lower bound of Alon and Spencer was improved by Achlioptas, Kim, Krivelevich and Tetali [6], motivated
by the analogies drawn in |llj between hypergraph 2-colorability and earlier work [17, 19] for
random k-SAT. Specifically, it was shown in [6] that a simple, linear-time algorithm w.h.p. finds a
2-coloring of $H_k(n,cn)$ for $c = O(2^k/k)$, implying $c_k = \Omega(2^k/k)$. These were the best bounds for
$c_k$ prior to Theorem 2 of the present paper.

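Finally, we note that Friedgut's result [31] applies to hypergraph 2-colorability as well, i.e.,

Theorem 4 ([31]) For each $k \ge 3$, there exists a sequence $c_k(n)$ such that for every $\epsilon > 0$,

\[
\lim_{n\to\infty} \Pr[H_k(n, cn) \text{ is 2-colorable}] =
\begin{cases}
1 & \text{if } c = (1 - \epsilon)\, c_k(n) \\
0 & \text{if } c = (1 + \epsilon)\, c_k(n) .
\end{cases}
\]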



3 The second moment method: first look 

In the rest of the paper it will be convenient to work with a model of random formulas that 
differs slightly from $F_k(n,m)$. Specifically, to generate a random k-CNF formula on n variables
with m clauses we simply generate a string of km independent random literals, each such literal
drawn uniformly from among all 2n possible ones. Note that this is equivalent to selecting, with
replacement, m clauses from among all possible $2^k n^k$ ordered k-clauses. This choice of distribution
for k-CNF formulas will simplify our calculations significantly. As we will see in Section 4.1, the
derived results can be easily transferred to all other standard models for random k-CNF formulas.



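3.1 Random k-SAT

For any formula F, given truth assignments $\sigma_1, \sigma_2, \ldots \in \{0,1\}^n$ we will write $\sigma_1, \sigma_2, \ldots \models F$ to
denote that all of $\sigma_1, \sigma_2, \ldots$ satisfy F. Let X = X(F) denote the number of satisfying assignments
of a formula F. Then, for a k-CNF formula with random clauses $c_1, c_2, \ldots, c_m$ we have

\[
E[X] = E\Big[\sum_{\sigma} \mathbf{1}_{\sigma \models F}\Big]
     = \sum_{\sigma} E\Big[\prod_{c_i} \mathbf{1}_{\sigma \models c_i}\Big]
     = \sum_{\sigma} \prod_{c_i} E\big[\mathbf{1}_{\sigma \models c_i}\big]
     = 2^n (1 - 2^{-k})^m , \tag{4}
\]

since clauses are drawn independently and the probability that $\sigma$ satisfies the ith random clause,
i.e., $E[\mathbf{1}_{\sigma \models c_i}]$, is $1 - 2^{-k}$ for every $\sigma$ and i. Similarly, for $E[X^2]$ we have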



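\[
E[X^2] = E\Big[\Big(\sum_{\sigma} \mathbf{1}_{\sigma \models F}\Big)^2\Big]
       = E\Big[\sum_{\sigma,\tau} \mathbf{1}_{\sigma,\tau \models F}\Big]
       = \sum_{\sigma,\tau} E\Big[\prod_{c_i} \mathbf{1}_{\sigma,\tau \models c_i}\Big]
       = \sum_{\sigma,\tau} \prod_{c_i} E\big[\mathbf{1}_{\sigma,\tau \models c_i}\big] . \tag{5}
\]

We claim that $E[\mathbf{1}_{\sigma,\tau \models c_i}]$, i.e., the probability that a fixed pair of truth assignments $\sigma, \tau$ satisfy the
ith random clause, depends only on the number of variables z to which $\sigma$ and $\tau$ assign the same
value. Specifically, if the overlap is $z = \alpha n$, we claim that this probability is

\[ f_S(\alpha) = 1 - 2^{1-k} + 2^{-k} \alpha^k . \tag{6} \]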
Our claim follows by inclusion-exclusion and observing that if $c_i$ is not satisfied by $\sigma$, the only
way for it to also not be satisfied by $\tau$ is for all k variables in $c_i$ to lie in the overlap of $\sigma$ and $\tau$.
Thus, $f_S$ quantifies the correlation between the events that $\sigma$ and $\tau$ are satisfying as a function
of their overlap. In particular, observe that truth assignments with overlap n/2 are uncorrelated
since $f_S(1/2) = (1 - 2^{-k})^2 = \Pr[\sigma \text{ is satisfying}]^2$.
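The formula $f_S(\alpha) = 1 - 2^{1-k} + 2^{-k}\alpha^k$ can be checked exactly on a tiny instance by enumerating every ordered clause in the model of this section, where each of the k literals is drawn independently and uniformly from the 2n possible ones. The Python sketch below does this with exact rational arithmetic; the function names and the encoding of a literal as a (variable, sign) pair are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product


def satisfies(sigma, clause):
    """A literal (v, s) is true under sigma iff sigma[v] == s."""
    return any(sigma[v] == s for v, s in clause)


def joint_satisfaction_probability(sigma, tau, k):
    """Exact probability that a uniformly random ordered k-clause
    (k independent uniform literals) is satisfied by both assignments."""
    n = len(sigma)
    literals = [(v, s) for v in range(n) for s in (0, 1)]
    good = sum(1 for clause in product(literals, repeat=k)
               if satisfies(sigma, clause) and satisfies(tau, clause))
    return Fraction(good, len(literals) ** k)


def f_S(alpha, k):
    """Closed form (6), evaluated exactly."""
    return 1 - Fraction(2) ** (1 - k) + Fraction(2) ** (-k) * alpha ** k


n, k = 4, 3
sigma = (0,) * n
for z in range(n + 1):
    tau = (0,) * z + (1,) * (n - z)   # agrees with sigma on exactly z variables
    assert joint_satisfaction_probability(sigma, tau, k) == f_S(Fraction(z, n), k)
```

The enumeration covers $(2n)^k = 512$ clauses per overlap value, so it is only feasible for very small n and k, but it confirms that the probability depends on the pair of assignments only through the overlap.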

Since the number of ordered pairs of assignments with overlap z is $2^n \binom{n}{z}$ we thus have

\[ E[X^2] = 2^n \sum_{z=0}^{n} \binom{n}{z} f_S(z/n)^m . \tag{7} \]

Writing $z = \alpha n$ and using the approximation $\binom{n}{\alpha n} = (\alpha^\alpha (1-\alpha)^{1-\alpha})^{-n} \times \mathrm{poly}(n)$ we see that

\[
E[X^2] \ge 2^n \Big( \max_{0 \le \alpha \le 1} \frac{f_S(\alpha)^r}{\alpha^\alpha (1-\alpha)^{1-\alpha}} \Big)^n \times \mathrm{poly}(n)
       = \Big( \max_{0 \le \alpha \le 1} \Lambda_S(\alpha) \Big)^n \times \mathrm{poly}(n) ,
\]

where $\Lambda_S(\alpha) = 2 f_S(\alpha)^r / (\alpha^\alpha (1-\alpha)^{1-\alpha})$.

At the same time observe that $E[X]^2 = (2^n (1 - 2^{-k})^{rn})^2 = (4 f_S(1/2)^r)^n = \Lambda_S(1/2)^n$. Therefore,
if there exists some $\alpha \in [0, 1]$ such that $\Lambda_S(\alpha) > \Lambda_S(1/2)$ then the second moment is exponentially
greater than the square of the expectation and we only get an exponentially small lower bound for
$\Pr[X > 0]$. Put differently, unless the dominant contribution to $E[X^2]$ comes from "uncorrelated"
pairs of satisfying assignments, i.e., pairs with overlap n/2, the second moment method fails.

With these observations in mind, in Fig. 1 we plot $\Lambda_S(\alpha)$ for k = 5 and different values of r.
We see that, unfortunately, for all values of r shown $\Lambda_S$ is maximized at some $\alpha > 1/2$. If we
look closely into the two factors comprising $\Lambda_S$, the reason for the failure of the second moment
method becomes apparent: while the entropic factor $1/(\alpha^\alpha (1-\alpha)^{1-\alpha})$ is symmetric around 1/2,
the correlation function $f_S$ is strictly increasing in [0, 1]. Therefore, the derivative of $\Lambda_S$ is never
0 at 1/2, instead becoming 0 at some $\alpha > 1/2$ where the benefit of positive correlation balances with
the cost of decreased entropy. (Indeed, this is true for all $k = o(\log n)$ and constant r > 0.)
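The claim that the maximizer sits to the right of 1/2 is easy to check numerically. The sketch below takes $\Lambda_S(\alpha) = 2 f_S(\alpha)^r / (\alpha^\alpha (1-\alpha)^{1-\alpha})$ with $f_S(\alpha) = 1 - 2^{1-k} + 2^{-k}\alpha^k$, works with its logarithm, and locates the maximum on a grid. The densities used in the usage note are illustrative choices, not necessarily those plotted in Fig. 1, and the function names are assumptions.

```python
import math


def log_lambda_S(alpha, k, r):
    """ln Lambda_S(alpha) = ln 2 + H(alpha) + r * ln f_S(alpha),
    where H is the natural-log binary entropy."""
    f = 1 - 2.0 ** (1 - k) + 2.0 ** (-k) * alpha ** k
    entropy = 0.0
    for p in (alpha, 1 - alpha):
        if p > 0:                      # convention: 0 * ln 0 = 0
            entropy -= p * math.log(p)
    return math.log(2) + entropy + r * math.log(f)


def argmax_alpha(k, r, steps=10_000):
    """Grid point in [0, 1] maximizing ln Lambda_S."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: log_lambda_S(a, k, r))
```

For example, `argmax_alpha(5, r)` for r in (5, 10, 20) returns a point strictly above 0.5 in each case, which is the asymmetry that defeats the naive second moment argument.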




3.2 Random NAE k-SAT

Let us now repeat the above analysis but with X = X(F) being the number of NAE-satisfying
truth assignments of a formula F. Recall that $\sigma$ is a NAE-satisfying assignment iff under $\sigma$ every
clause has at least one satisfied literal and at least one unsatisfied literal. Thus, for a k-CNF
formula with random clauses $c_1, c_2, \ldots, c_m$, proceeding as in (4), we get

\[ E[X] = 2^n (1 - 2^{1-k})^m , \tag{8} \]

since the probability that $\sigma$ NAE-satisfies the ith random clause is $1 - 2^{1-k}$ for every $\sigma$ and i.

Regarding the second moment, proceeding exactly as in (5), we write $E[X^2]$ as a sum over the
$4^n$ ordered pairs of assignments of the probability that both assignments are NAE-satisfying. As
for k-SAT, for any fixed pair this probability depends only on the overlap. The only change is that
if $\sigma, \tau$ agree on $z = \alpha n$ variables then the probability they both NAE-satisfy a random clause $c_i$ is

\[ \Pr[\sigma \text{ and } \tau \text{ NAE-satisfy } c_i] = 1 - 2^{2-k} + 2^{1-k} \big(\alpha^k + (1-\alpha)^k\big) \equiv f_N(\alpha) . \tag{9} \]

Again, this claim follows from inclusion-exclusion and observing that for $\sigma, \tau$ to both NAE-violate
$c_i$, the variables of $c_i$ must either all be in the overlap of $\sigma$ and $\tau$ or all be in their non-overlap.
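The inclusion-exclusion step can be made explicit as follows. The events A and B are names introduced only for this sketch: A is the event that $\sigma$ NAE-violates the random clause $c_i$, B is the same event for $\tau$, and each of the k literals is assumed to be drawn independently and uniformly from the 2n possible ones, as in the model of this section.

```latex
\begin{align*}
\Pr[A] = \Pr[B] &= 2 \cdot 2^{-k} = 2^{1-k}
   && \text{(all $k$ literals false, or all true)} \\
\Pr[A \wedge B] &= 2\Big(\frac{\alpha}{2}\Big)^{k} + 2\Big(\frac{1-\alpha}{2}\Big)^{k}
                 = 2^{1-k}\big(\alpha^{k} + (1-\alpha)^{k}\big) \\
\Pr[\overline{A} \wedge \overline{B}] &= 1 - \Pr[A] - \Pr[B] + \Pr[A \wedge B]
                 = 1 - 2^{2-k} + 2^{1-k}\big(\alpha^{k} + (1-\alpha)^{k}\big) .
\end{align*}
```

In the second line, the term with $\alpha$ covers clauses whose literals all fall on overlap variables (all false under both assignments, or all true under both), and the term with $1-\alpha$ covers clauses whose literals all fall on non-overlap variables (all false under one assignment and all true under the other).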

Applying Stirling's approximation for the factorial again and observing that the sum defining
$E[X^2]$ has only a polynomial number of terms, we now get

\[
E[X^2] \le 2^n \Big( \max_{0 \le \alpha \le 1} \frac{f_N(\alpha)^r}{\alpha^\alpha (1-\alpha)^{1-\alpha}} \Big)^n \times \mathrm{poly}(n)
       = \Big( \max_{0 \le \alpha \le 1} \Lambda_N(\alpha) \Big)^n \times \mathrm{poly}(n) , \tag{10}
\]

where $\Lambda_N(\alpha) = 2 f_N(\alpha)^r / (\alpha^\alpha (1-\alpha)^{1-\alpha})$.

As before, it is easy to see that $E[X]^2 = \Lambda_N(1/2)^n$. Therefore, if $\Lambda_N(1/2) > \Lambda_N(\alpha)$ for every $\alpha \ne
1/2$ then (10) implies that the ratio between $E[X^2]$ and $E[X]^2$ is at most polynomial in n. Indeed,
with a more careful analysis of the interplay between the summation and Stirling's approximation,
we will later show that whenever $\Lambda_N(1/2)$ is a global maximum, the ratio $E[X^2]/E[X]^2$ is bounded
by a constant, implying that NAE-satisfiability holds w.u.p.p. So, all in all, again we hope that the
dominant contribution to $E[X^2]$ comes from pairs of assignments with overlap n/2.

The crucial difference is that now the correlation function $f_N$ is symmetric around 1/2 and,
hence, so is $\Lambda_N$. As a result, the entropy-correlation product $\Lambda_N$ always has a local extremum at
1/2. Moreover, since the entropic term is always maximized at $\alpha = 1/2$ and is independent of r,
for sufficiently small r this extremum is a global maximum. With these considerations in mind, in
Fig. 2 we plot $\Lambda_N(\alpha)$ for k = 5 and various values of r.

Let us start with the picture on the left, where r increases from 8 to 12 as we go from top to
bottom. For r = 8, 9 we see that indeed $\Lambda_N$ has a global maximum at 1/2 and the second moment
method succeeds. For the cases r = 11, 12, on the other hand, we see that $\Lambda_N(1/2)$ is actually a
global minimum. In fact, we see that $\Lambda_N(1/2) < 1$, implying that $E[X]^2 = \Lambda_N(1/2)^n = o(1)$ and
so w.h.p. there are no NAE-satisfying assignments for such r. It is worth noting that for r = 11,
even though X = 0 w.h.p., the second moment is exponentially large (since $\Lambda_N > 1$ near 0 and 1).

The most interesting case is r = 10. Here $\Lambda_N(1/2) = 1.0023\ldots$ is a local maximum and greater
than 1, but the two global maxima occur at $\alpha = 0.08\ldots$ and $\alpha = 0.92\ldots$ where the function equals
$1.0145\ldots$ As a result, again, the second moment method only gives an exponentially small lower
bound on $\Pr[X > 0]$. Note that this is in spite of the fact that E[X] is now exponentially large.
Indeed, the largest value for which the second moment succeeds for k = 5 is $r = 9.973\ldots$ when the
two side peaks reach the same height as the peak at 1/2 (see the plot on the right in Fig. 2).

[Figure 2. Left panel: k = 5, r = 8, 9, 10, 11, 12 (top to bottom). Right panel: k = 5, r = 9.973. Horizontal axis: $\alpha$.]

Figure 2: The contribution to $E[X^2]$ as a function of the overlap for random NAE k-SAT.

So, the situation can be summarized as follows. By requiring that we only count NAE-satisfying
truth assignments we make it, roughly, twice as hard to satisfy each clause. This manifests itself in
the additional factor of 2 in the middle term of $f_N$ compared to $f_S$. On the other hand, now, the
third term of $f_N$, capturing "joint" behavior, is symmetric around 1/2, making $\Lambda_N$ itself symmetric
around 1/2. This enables the second moment method which, indeed, only breaks down when the
density gets within an additive constant of the upper bound for the NAE k-SAT threshold.
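The largest density at which the peak at 1/2 remains a global maximum can be located numerically. The sketch below assumes $\Lambda_N(\alpha) = 2 f_N(\alpha)^r / (\alpha^\alpha (1-\alpha)^{1-\alpha})$ with $f_N(\alpha) = 1 - 2^{2-k} + 2^{1-k}(\alpha^k + (1-\alpha)^k)$, tests the condition $\Lambda_N(\alpha) \le \Lambda_N(1/2)$ on a grid, and bisects on r. The grid size, tolerance and function names are choices made for this example rather than anything prescribed by the paper.

```python
import math


def log_ratio(alpha, k, r):
    """ln( Lambda_N(alpha) / Lambda_N(1/2) ) for the NAE correlation function."""
    def log_lambda(a):
        f = 1 - 2.0 ** (2 - k) + 2.0 ** (1 - k) * (a ** k + (1 - a) ** k)
        entropy = -sum(p * math.log(p) for p in (a, 1 - a) if p > 0)
        return math.log(2) + entropy + r * math.log(f)
    return log_lambda(alpha) - log_lambda(0.5)


def half_is_global_max(k, r, steps=2000, tol=1e-12):
    """True if no grid point beats alpha = 1/2 (up to a small tolerance)."""
    return all(log_ratio(i / steps, k, r) <= tol for i in range(steps + 1))


def second_moment_threshold(k, lo, hi, iters=40):
    """Bisection on r; assumes the test succeeds at lo and fails at hi.
    This is valid because f_N(alpha) >= f_N(1/2), so the log-ratio is
    nondecreasing in r for every fixed alpha."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if half_is_global_max(k, mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Calling `second_moment_threshold(5, 8.0, 12.0)` should return a value close to 9.973, in line with the density quoted above for k = 5 and with the corresponding lower bound in Table 1.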

3.3 How symmetry reduces variance 

Given a truth assignment a and an arbitrary CNF formula F, let Q = Q(a,F) denote the total 
number of literal occurrences in F satisfied by a. So, for example, Q is maximized by those truth 
assignments that assign every variable its "majority" value. With this definition at hand, a potential 
explanation of how symmetry reduces the variance is suggested by considering the following trivial 
refinement of our generative model: first i) draw km i.i.d. uniformly random literals just as before 
and then ii) partition the drawn literals randomly into fe-clauses (rather than assuming that the 
first k literals form the first clause, the next k the second, etc.). 

In particular, imagine that we have just finished performing the first generative step above 
and we are about to perform the second. Observe that at this point the value of Q has already 
been determined for every a G {0, l} n . Moreover, for each fixed a the conditional probability of 
yielding a satisfying assignment corresponds to a balls-in-bins experiment: distribute Q(cr) balls in 
m bins, each with capacity k, so that every bin receives at least one ball. It is clear that those truth 
assignments for which Q is large at the end of the first step have a big advantage in the second. 

To get an idea of what $Q$ typically looks like on $\{0,1\}^n$ we begin by observing that the number
of occurrences of a fixed literal $\ell$, $B_\ell$, is distributed as $\mathrm{Bin}(km, 1/(2n))$. Thus, $E[B_\ell] = \Theta(1)$ and,
moreover, the random variables $B_\ell$ are very weakly correlated. As a result, at the end of the first
step, $Q$ is typically a very smooth function on $\{0,1\}^n$, attaining a maximum value at the subcube of
majority vote assignments and gradually decreasing away from them. Thus, at the end of the first
step the "more promising" truth assignments are correlated: in satisfying many literal occurrences
(thus increasing their odds for the second step), they tend to overlap with each other (and the
majority assignment) at more than half the variables.
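As a small illustration of the quantity $Q$, the following Python sketch computes $Q(\sigma)$ for a majority vote assignment and for a uniformly random assignment after step (i). The literal encoding and the helper names are assumptions made for this sketch.

```python
import random

def count_satisfied_occurrences(sigma, literals):
    """Q(sigma): number of literal occurrences (var, sign) satisfied by sigma."""
    return sum(1 for var, sign in literals if sigma[var] == sign)

def majority_assignment(n, literals):
    """Give each variable the sign under which it occurs most often (ties -> True)."""
    balance = [0] * n
    for var, sign in literals:
        balance[var] += 1 if sign else -1
    return [b >= 0 for b in balance]

rng = random.Random(1)
n, k, m = 40, 3, 200
literals = [(rng.randrange(n), rng.random() < 0.5) for _ in range(k * m)]
maj = majority_assignment(n, literals)
rand_sigma = [rng.random() < 0.5 for _ in range(n)]
q_maj = count_satisfied_occurrences(maj, literals)
q_rand = count_satisfied_occurrences(rand_sigma, literals)
```

Since every literal occurrence is satisfied by exactly one of $\sigma$ and its complement, $Q(\sigma) + Q(\bar\sigma) = km$, which is why the average value of $Q$ is $km/2$.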

In contrast, if we focus on NAE-satisfying assignments, at the end of the first step the most
promising assignments $\sigma$ are those for which $Q(\sigma)$ is very close to its average value $km/2$. So, when
the problem is symmetric, the typical case becomes the most favorable case and the clustering
around truth assignments that satisfy many literal occurrences disappears.

If indeed "populism", i.e., the tendency of each variable to assume its majority value in the
formula, is the main source of correlations in random $k$-SAT, then the second moment method is a
good candidate for $k$-CNF models which do not encourage this tendency¹. For example, one such
model is regular random $k$-SAT, in which every literal occurs exactly the same number of times.
Such formulas can be analyzed using a model analogous to the configuration model of random
graphs, i.e., by taking precisely $d$ copies of each literal and partitioning the resulting $2dn$ copies
into clauses randomly (exactly as in the second step of our two-step model for random $k$-SAT).
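A configuration-style generator for regular random $k$-SAT might look as follows in Python. This is a sketch under the assumption that $k$ divides $2dn$; the function name and literal encoding are hypothetical, and clauses containing a repeated variable are not filtered out.

```python
import random
from collections import Counter

def regular_random_formula(n, k, d, rng):
    """Take d copies of each of the 2n literals and partition the 2dn copies
    uniformly at random into clauses of size k (requires k to divide 2dn)."""
    if (2 * d * n) % k != 0:
        raise ValueError("k must divide 2*d*n")
    copies = [(var, sign) for var in range(n) for sign in (True, False) for _ in range(d)]
    rng.shuffle(copies)
    return [copies[i:i + k] for i in range(0, len(copies), k)]

formula = regular_random_formula(n=30, k=3, d=4, rng=random.Random(2))
# Every literal should occur exactly d times across the formula.
occurrences = Counter(lit for clause in formula for lit in clause)
```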

3.4 Geometry and connections to statistical physics 

A key quantity in statistical physics is the overlap distribution between configurations of minimum 
energy, known as ground states. When a constraint satisfaction problem is satisfiable, ground 
states correspond to solutions, such as satisfying assignments, 2-colorings, and so on. In the case 
of random $k$-SAT, the overlap distribution is the probability $P(\alpha)$ that a random pair of satisfying
assignments have overlap $\alpha n$.

This overlap distribution gives an intriguing interpretation of our results. In writing $E[X^2]$ as
a sum of contributions from pairs of assignments with different overlaps, we have in fact calculated
the average of $P(\alpha)$ over all formulas, weighted by the number of pairs of satisfying assignments of
each one. Physicists call this weighted average the "annealed approximation" of $P(\alpha)$, and denote
it $P_{\mathrm{ann}}(\alpha)$. It is worth pointing out that, while the annealed approximation clearly overemphasizes
formulas with more satisfying assignments, Monasson and Zecchina conjectured, based on
the replica method, that it becomes asymptotically tight as $k \to \infty$.

On a more rigorous footing, it is easy to see that in our case $P_{\mathrm{ann}}(\alpha)$ is proportional to $\Lambda(\alpha)^n$.
Therefore, whenever $\Lambda$ is peaked at 1/2, $P_{\mathrm{ann}}(\alpha)$ is tightly peaked around 1/2, since $\Lambda(\alpha)^n$ vanishes
for all other values of $\alpha$ as $n \to \infty$. This is precisely what we prove occurs in random NAE $k$-SAT for
densities up to $2^{k-1}\ln 2 - O(1)$. In other words, for densities almost all the way to the random NAE
$k$-SAT threshold, in the annealed approximation, the NAE-satisfying assignments are scattered
throughout the hypercube as if they were independent.

Note that even if $P(\alpha)$ is concentrated around 1/2 (rather than just $P_{\mathrm{ann}}(\alpha)$) this still allows for
a typical geometry where there are exponentially many, exponentially large clusters, each centered
at a random assignment. Indeed, this is precisely the picture suggested by some very recent,
ground-breaking work of Mézard, Parisi, and Zecchina [46, 47], based on non-rigorous techniques
of statistical physics. If this is indeed the true picture, establishing it rigorously would require 
considerations much more refined than the second moment of the number of solutions. More 
generally, getting a better understanding of the typical geometry and its potential implications for 
algorithms appears to us a very challenging and very important open problem. 

4 Groundwork 

4.1 Generative Models 

Given a set $V$ of $n$ Boolean variables, let $C_k = C_k(V)$ denote the set of all proper $k$-clauses on
$V$, i.e., the set of all $2^k\binom{n}{k}$ disjunctions of $k$ literals involving distinct variables. Similarly, given
a set $V$ of $n$ vertices, let $E_k = E_k(V)$ be the set of all $\binom{n}{k}$ $k$-subsets of $V$. As we saw, a random
$k$-CNF formula $F_k(n,m)$ is formed by selecting uniformly a random $m$-subset of $C_k$, while a random
$k$-uniform hypergraph $H_k(n,m)$ is formed by selecting uniformly a random $m$-subset of $E_k$.

¹ We describe recent developments on this point in the Conclusions.

While $F_k(n,m)$ and $H_k(n,m)$ are perhaps the most natural models for generating random
$k$-CNF formulas and random $k$-uniform hypergraphs, respectively, there are a number of slight
variations of each model. Those are largely motivated by amenability to certain calculations. To 
simplify the discussion we focus on models for random formulas in the rest of this subsection. All 
our comments transfer readily to models for random hypergraphs. 

For example, it is fairly common to consider the clauses as ordered $k$-tuples (rather than as
$k$-sets) and/or to allow replacement in sampling the set $C_k$. Clearly, for properties such as satisfiability
the issue of ordering is irrelevant. Moreover, as long as $m = O(n)$, essentially the same is true for the
issue of replacement. To see that, observe that w.h.p. the number of repeated clauses is $q = o(n)$
and the subset of $m - q$ distinct clauses is uniformly random. Thus, if a monotone decreasing
property (such as satisfiability) holds with probability $p$ for a given $m = r^* n$ when replacement is
allowed, it holds with probability $p - o(1)$ for all $r < r^*$ when replacement is not allowed.

The issue of selecting the literals of each clause with replacement (which might result in some
"improper" clauses) is completely analogous. That is, the probability that a variable appears more
than once in a given clause is at most $k^2/n = O(1/n)$ and hence w.h.p. there are $o(n)$ improper
clauses. Finally, we note that by standard techniques our results also transfer to the $F_k(n,p)$ model
where every clause appears independently of all others with probability $p$. For that it suffices to
set $p$ such that $2^k\binom{n}{k}p = r^* n - o(n)$.

4.2 Strategy and Tools 

Our plan is to consider random $k$-CNF formulas formed by generating $km$ i.i.d. random literals,
where $m = rn$, and proving that if $X = X(F)$ is the number of NAE-satisfying assignments then:

Lemma 2 For all $\varepsilon > 0$, $k \ge k_0(\varepsilon)$ and $r < 2^{k-1}\ln 2 - (1 + \ln 2)/2 - \varepsilon$, there exists some constant
$C = C(k,r) > 0$ such that

$$E[X^2] < C \times E[X]^2 .$$

By Lemma 1 and our discussion in Section 4.1 this implies that $F_k(n, rn - o(n))$ is NAE-satisfiable
w.u.p.p. Since a NAE-satisfiable formula is also satisfiable, we have established that
$F_k(n,rn)$ is satisfiable w.u.p.p. for all $r$ as in Lemma 2. To boost this to a high probability result,
thus establishing our lower bound for random $k$-SAT, we employ the following immediate corollary of Theorem 1.

Corollary 1 If $F_k(n, r^* n)$ is satisfiable w.u.p.p. then $F_k(n,rn)$ is satisfiable w.h.p. for all $r < r^*$.

Friedgut's arguments apply equally well to NAE $k$-SAT, implying that $F_k(n,rn)$ is w.h.p.
NAE-satisfiable for $r$ as in Lemma 2. Thus, Lemma 2 readily yields (12) below, while (11) comes
from noting that the expected number of NAE-satisfying assignments is $[2(1 - 2^{1-k})^r]^n$. (Similarly
to hypergraphs, known techniques can be used to improve the bound in (11) to $2^{k-1}\ln 2 -
(\ln 2)/2 - 1/4 + t_k$, where $t_k \to 0$.) Indeed, we will see that the proof of Theorem 5 will yield
Theorem 2 for random hypergraphs with little additional effort.
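The first-moment calculation mentioned above is easy to evaluate numerically. The Python sketch below (function names are illustrative, not from the text) computes the density at which $2(1 - 2^{1-k})^r = 1$ and compares it with the closed-form expressions $2^{k-1}\ln 2 - (\ln 2)/2$ and $2^{k-1}\ln 2 - (1 + \ln 2)/2$ that appear in this section.

```python
import math

def first_moment_threshold(k):
    """Density r at which 2 * (1 - 2**(1-k))**r equals 1."""
    return math.log(2) / -math.log1p(-2.0 ** (1 - k))

def upper_bound_expression(k):
    """2^(k-1) ln 2 - (ln 2)/2."""
    return 2 ** (k - 1) * math.log(2) - math.log(2) / 2

def lemma2_expression(k, eps=0.0):
    """2^(k-1) ln 2 - (1 + ln 2)/2 - eps, the condition on r in Lemma 2."""
    return 2 ** (k - 1) * math.log(2) - (1 + math.log(2)) / 2 - eps

for k in (3, 5, 10, 20):
    print(k, first_moment_threshold(k), upper_bound_expression(k), lemma2_expression(k))
```

The expected number of NAE-satisfying assignments tends to zero above `first_moment_threshold(k)`, which lies slightly below $2^{k-1}\ln 2 - (\ln 2)/2$; the gap to the Lemma 2 expression is exactly 1/2 when `eps` is zero.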

Theorem 5 For all $k \ge 3$, $F_k(n, m = rn)$ is w.h.p. non-NAE-satisfiable if

$$r > 2^{k-1}\ln 2 - \frac{\ln 2}{2} . \tag{11}$$

There exists a sequence $t_k \to 0$ such that for all $k \ge 3$, $F_k(n, m = rn)$ is w.h.p. NAE-satisfiable if

$$r < 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1 + t_k}{2} . \tag{12}$$
As we saw in Section 3.2 the second moment of the number of NAE-satisfying assignments is

$$E[X^2] = 2^n \sum_{z=0}^{n} \binom{n}{z} f_N(z/n)^{rn} .$$

A slightly more complicated sum will occur when we bound the second moment of the number of
2-colorings. To bound both sums we will use the following lemma which we prove in Section 8.

Lemma 3 [Laplace lemma] Let $\phi$ be a positive, twice-differentiable function on $[0,1]$ and let
$q \ge 1$ be a fixed integer. Let $t = n/q$ and let

$$S_n = \sum_{z=0}^{t} \binom{t}{z}^q \phi(z/t)^n .$$

Letting $0^0 \equiv 1$, define $g$ on $[0,1]$ as

$$g(\alpha) = \frac{\phi(\alpha)}{\alpha^\alpha (1-\alpha)^{1-\alpha}} .$$

If there exists $\alpha_{\max} \in (0,1)$ such that $g(\alpha_{\max}) \equiv g_{\max} > g(\alpha)$ for all $\alpha \ne \alpha_{\max}$, and $g''(\alpha_{\max}) < 0$,
then there exists a constant $C = C(q, \alpha_{\max}, g_{\max}, g''(\alpha_{\max})) > 0$ such that

$$S_n < C \times n^{-(q-1)/2} \times g_{\max}^n .$$

5 Bounding the Second Moment for NAE $k$-SAT

Recall that if $X$ is the number of NAE-assignments, then

$$E[X] = 2^n (1 - 2^{1-k})^{rn}$$

and

$$E[X^2] = 2^n \sum_{z=0}^{n} \binom{n}{z} f_N(z/n)^{rn} , \tag{13}$$

where

$$f_N(\alpha) = 1 - 2^{2-k} + 2^{1-k}\left(\alpha^k + (1-\alpha)^k\right) .$$

To bound the sum in (13) we apply Lemma 3 with $q = 1$ and $\phi(\alpha) = f_N(\alpha)^r$. Thus, $g = g_r$ where

$$g_r(\alpha) = \frac{f_N(\alpha)^r}{\alpha^\alpha (1-\alpha)^{1-\alpha}} . \tag{14}$$

To show that Lemma 3 applies, we will prove in Section 7 that

Lemma 4 For every $\varepsilon > 0$, there exists $k_0 = k_0(\varepsilon)$ such that for all $k \ge k_0$, if

$$r < 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - \varepsilon$$

then $g_r(\alpha) < g_r(1/2)$ for all $\alpha \ne 1/2$, and $g_r''(1/2) < 0$.

Therefore, for all $r$, $k$, and $\varepsilon$ as in Lemma 4 there exists a constant $C = C(k,r) > 0$ such that

$$E[X^2] < C \times 2^n g_r(1/2)^n .$$

Since $E[X]^2 = 2^n g_r(1/2)^n$ we get that for all $r$, $k$, $\varepsilon$ as in Lemma 4

$$E[X^2] < C \times E[X]^2 .$$
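The ratio $E[X^2]/E[X]^2$ can be evaluated directly from the expressions for the two moments. The following Python sketch does so in logarithms to avoid overflow; the function names, and the choice $m = \mathrm{round}(rn)$, are assumptions of the sketch rather than anything prescribed by the text.

```python
import math

def f_N(alpha, k):
    return 1 - 2.0 ** (2 - k) + 2.0 ** (1 - k) * (alpha ** k + (1 - alpha) ** k)

def log_first_moment(n, k, m):
    """ln E[X] = n ln 2 + m ln(1 - 2^(1-k))."""
    return n * math.log(2) + m * math.log1p(-2.0 ** (1 - k))

def log_second_moment(n, k, m):
    """ln E[X^2] from the sum over overlaps z, via a log-sum-exp."""
    terms = []
    for z in range(n + 1):
        log_binom = math.lgamma(n + 1) - math.lgamma(z + 1) - math.lgamma(n - z + 1)
        terms.append(log_binom + m * math.log(f_N(z / n, k)))
    top = max(terms)
    return n * math.log(2) + top + math.log(sum(math.exp(t - top) for t in terms))

def moment_ratio(n, k, r):
    m = round(r * n)
    return math.exp(log_second_moment(n, k, m) - 2 * log_first_moment(n, k, m))

ratio_200 = moment_ratio(200, 5, 8.0)
ratio_400 = moment_ratio(400, 5, 8.0)
```

For a density at which $g_r$ is maximized at $1/2$, the ratio is expected to stay bounded as $n$ grows, in line with the inequality above.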

6 Bounding the Second Moment for Hypergraph 2-colorability 

Just as for NAE $k$-SAT, it will be easier to work with the model in which generating a random
hypergraph corresponds to generating $km$ random vertices, each such vertex chosen uniformly at
random with replacement, and letting the first $k$ vertices form the first hyperedge, etc.

In [5] we proved (2) of Theorem 2 by letting $X$ be the number of all 2-colorings and using a convexity
argument to show that $E[X^2]$ is dominated by the contribution of balanced colorings, i.e., colorings
with an equal number of black and white vertices. Here we follow a simpler approach suggested by
David Karger; namely, we define $X$ to be the number of balanced 2-colorings. We emphasize that,
while technically convenient, the restriction to balanced 2-colorings is not essential for the second
moment method to succeed on hypergraph 2-colorability, i.e., one has $E[X^2] = O(E[X]^2)$ even if
$X$ is the number of all 2-colorings.

Of course, in order for balanced colorings to exist $n$ must be even and we will assume that in
our calculations below. To get Theorem 2 for all sufficiently large $n$, we observe that if for a given
$c^*$, $H_k(2n, m = 2c^* n)$ is 2-colorable w.h.p. then for all $c < c^*$, $H_k(n, cn)$ is 2-colorable w.h.p. since
deleting a random vertex of $H_k(2n, 2c^* n)$ w.h.p. removes $o(n)$ edges. With this in mind, in the
following we let $X$ be the number of balanced 2-colorings and assume that $n$ is even.

Since the vertices in each hyperedge are chosen uniformly with replacement, the probability
that a random hyperedge is bichromatic in a fixed balanced partition is $1 - 2^{1-k}$. Since there are
$\binom{n}{n/2}$ such partitions and the $m$ hyperedges are drawn independently, we have

$$E[X] = \binom{n}{n/2}\left(1 - 2^{1-k}\right)^m .$$



To calculate the second moment, as we did for NAE $k$-SAT, we write $E[X^2]$ as a sum over all
pairs of balanced partitions. In order to estimate this sum we first observe that if two balanced
partitions $\sigma$ and $\tau$ have exactly $z$ black vertices in common, then they must also have exactly $z$
white vertices in common. Thus $\sigma$ and $\tau$ define four groups of vertices: $z$ that are black in both,
$z$ that are white in both, $n/2 - z$ that are black in $\sigma$ and white in $\tau$, and $n/2 - z$ that are white
in $\sigma$ and black in $\tau$. Clearly, a random hyperedge is monochromatic in both $\sigma$ and $\tau$ iff all its
vertices fall into the same group. Since the vertices of each hyperedge are chosen uniformly with
replacement, this probability is

$$2\left(\frac{z}{n}\right)^k + 2\left(\frac{n/2 - z}{n}\right)^k = 2^{1-k}\left[\left(\frac{2z}{n}\right)^k + \left(1 - \frac{2z}{n}\right)^k\right] . \tag{15}$$

Thus, by inclusion-exclusion, the probability that a random hyperedge is bichromatic in both $\sigma$
and $\tau$ is

$$1 - 2^{2-k} + 2^{1-k}\left[\left(\frac{2z}{n}\right)^k + \left(1 - \frac{2z}{n}\right)^k\right] = f_N(2z/n) ,$$

where $f_N(\alpha) = 1 - 2^{2-k} + 2^{1-k}(\alpha^k + (1-\alpha)^k)$ is the function we defined for NAE $k$-SAT in Section 5.
Moreover, observe that the number of pairs of partitions with such overlap is

$$\binom{n}{z,\; z,\; n/2 - z,\; n/2 - z} = \binom{n}{n/2}\binom{n/2}{z}^2 .$$
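The counting identity for pairs of balanced partitions can be checked mechanically for small even $n$. The Python sketch below is only a sanity check with illustrative function names; it also relies on the standard identity $\sum_z \binom{n/2}{z}^2 = \binom{n}{n/2}$, so that the pair counts sum to $\binom{n}{n/2}^2$.

```python
from math import comb, factorial

def pairs_with_overlap(n, z):
    """Multinomial coefficient n! / (z! z! (n/2 - z)! (n/2 - z)!) for even n."""
    half = n // 2
    return factorial(n) // (factorial(z) ** 2 * factorial(half - z) ** 2)

def pairs_with_overlap_factored(n, z):
    """The same count written as choose(n, n/2) * choose(n/2, z)^2."""
    half = n // 2
    return comb(n, half) * comb(half, z) ** 2
```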
Since hyperedges are drawn independently and with replacement, by summing over $z$ we thus get

$$E[X^2] = \binom{n}{n/2} \sum_{z=0}^{n/2} \binom{n/2}{z}^2 f_N(2z/n)^{cn} .$$

To bound this sum we apply Lemma 3 with $q = 2$ and $\phi(\alpha) = f_N(\alpha)^c$. Felicitously, we find ourselves
maximizing a function $g_c$ which, if we replace $c$ with $r$, is exactly the same function $g_r$ we defined
in (14) for NAE $k$-SAT. Thus, setting $c = r$ where $k$, $r$ and $\varepsilon$ are as in Lemma 4, $g_c$ is maximized
at $\alpha = 1/2$ with $g_c''(1/2) < 0$, and Lemma 3 implies that there exists a constant $C = C(r,k) > 0$
such that

$$E[X^2] < C\, n^{-1/2} \binom{n}{n/2}\, g_c(1/2)^n .$$

We now bound $E[X]$ from below using Stirling's approximation (28) and get

$$\frac{E[X^2]}{E[X]^2} < C \times \frac{n^{-1/2}\binom{n}{n/2}\, g_c(1/2)^n}{\binom{n}{n/2}^2 (1 - 2^{1-k})^{2m}} = C \times \frac{n^{-1/2}\, 2^n}{\binom{n}{n/2}} \to C \times \frac{\sqrt{\pi}}{\sqrt{2}} .$$

To complete the proof, analogously to NAE $k$-SAT, we use the following "boosting" corollary of
Theorem 1.

Corollary 2 If $H_k(n, c^* n)$ is 2-colorable w.u.p.p. then $H_k(n, cn)$ is 2-colorable w.h.p. for all $c < c^*$.

7 Proof of Lemma 4

We need to prove $g_r''(1/2) < 0$ and $g_r(\alpha) < g_r(1/2)$ for all $\alpha \ne 1/2$. As $g_r$ is symmetric around
1/2, we can restrict to $\alpha \in (1/2, 1]$. We divide $(1/2, 1]$ into two parts and handle them with two
separate lemmata. The first lemma deals with $\alpha \in (1/2, 0.9]$ and also establishes that $g_r''(1/2) < 0$.

Lemma 5 Let $\alpha \in (1/2, 0.9]$. For all $k \ge 74$, if $r < 2^{k-1}\ln 2$ then $g_r(\alpha) < g_r(1/2)$ and $g_r''(1/2) < 0$.

The second lemma deals with $\alpha \in (0.9, 1]$.

Lemma 6 Let $\alpha \in (0.9, 1]$. For every $\varepsilon > 0$ and all $k \ge k_0(\varepsilon)$, if $r < 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - \varepsilon$ then
$g_r(\alpha) < g_r(1/2)$.

Combining Lemmata 5 and 6 we see that for every $\varepsilon > 0$ and $k \ge k_0 = k_0(\varepsilon)$, if

$$r < 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - \varepsilon$$

then $g_r(\alpha) < g_r(1/2)$ for all $\alpha \ne 1/2$ and $g_r''(1/2) < 0$, establishing Lemma 4.

We prove Lemmata 5 and 6 below. The reader should keep in mind that we have made no
attempt to optimize the value of $k_0$ in Lemma 6, aiming instead for proof simplicity. For the lower
bounds presented in Table 1 we computed numerically, for each $k$, the largest value of $r$ for which
the conclusions of Lemma 4 hold. In each case, the condition $g_r''(1/2) < 0$ was satisfied with room
to spare, while establishing $g_r(1/2) > g_r(\alpha)$ for all $\alpha \ne 1/2$ was greatly simplified by the fact that $g_r$
always has no more than three local extrema in $[0,1]$.
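A numerical search of this kind could be sketched as follows in Python. This is not the computation used for the table: it is a simple grid check of $g_r(\alpha) < g_r(1/2)$ combined with bisection on $r$, with an arbitrarily chosen grid size, and it does not verify the second-derivative condition separately. All names are illustrative.

```python
import math

def log_g(alpha, k, r):
    """ln g_r(alpha) = r ln f_N(alpha) + h(alpha), with 0 ln 0 taken as 0."""
    f = 1 - 2.0 ** (2 - k) + 2.0 ** (1 - k) * (alpha ** k + (1 - alpha) ** k)
    h = -sum(x * math.log(x) for x in (alpha, 1 - alpha) if x > 0)
    return r * math.log(f) + h

def half_is_strict_max(k, r, grid=20000):
    """Grid check of g_r(alpha) < g_r(1/2) on (1/2, 1]; symmetry covers [0, 1/2)."""
    top = log_g(0.5, k, r)
    return all(log_g(i / grid, k, r) < top for i in range(grid // 2 + 1, grid + 1))

def largest_density(k, steps=30):
    """Bisection for the largest r that passes the grid check."""
    lo, hi = 0.0, 2 ** (k - 1) * math.log(2)
    for _ in range(steps):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if half_is_strict_max(k, mid) else (lo, mid)
    return lo
```

Bisection is justified here because, for fixed $\alpha > 1/2$, the condition $g_r(\alpha) < g_r(1/2)$ is equivalent to $r \ln(f_N(\alpha)/f_N(1/2)) < \ln 2 - h(\alpha)$, which is monotone in $r$.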

Proof of Lemma 5. We will first prove that for $k \ge 74$, $g_r$ is strictly decreasing in $\alpha \in (1/2, 0.9]$,
thus establishing $g_r(\alpha) < g_r(1/2)$. Since $g_r$ is positive, to do this it suffices to prove that $(\ln g_r)' =
g_r'/g_r < 0$ in this interval. In fact, since $g_r'(\alpha) = (\ln g_r)' = 0$ at $\alpha = 1/2$, it will suffice to prove that
for $\alpha \in [1/2, 0.9]$ we have $(\ln g_r)'' < 0$. Now, writing $f$ for $f_N$,

$$(\ln g_r(\alpha))'' = r\left(\frac{f''(\alpha)}{f(\alpha)} - \frac{f'(\alpha)^2}{f(\alpha)^2}\right) - \frac{1}{\alpha(1-\alpha)} \le r\,\frac{f''(\alpha)}{f(\alpha)} - \frac{1}{\alpha(1-\alpha)} . \tag{16}$$

To show that the r.h.s. of (16) is negative we first note that for $\alpha \ge 1/2$ and $k \ge 3$,

$$f''(\alpha) = 2^{1-k} k(k-1)\left(\alpha^{k-2} + (1-\alpha)^{k-2}\right) < 2^{2-k}\alpha^{k-2}k^2$$

is monotonically increasing. Therefore, $f''(\alpha) \le f''(0.9) < 2^{2-k}\, 0.9^{k-2}\, k^2$.

Moreover, for all $\alpha$, $f(\alpha) \ge f(1/2) = (1 - 2^{1-k})^2$. Therefore, since $1/(\alpha(1-\alpha)) \ge 4$ and
$r < 2^{k-1}\ln 2$, it suffices to observe that for all $k \ge 74$,

$$2^{k-1}\ln 2 \times \frac{2^{2-k}\, 0.9^{k-2}\, k^2}{(1 - 2^{1-k})^2} - 4 < 0 .$$

Finally, recalling that $g_r'(1/2) = 0$ and using

$$(\ln g_r(\alpha))'' = \frac{g_r''(\alpha)}{g_r(\alpha)} - \frac{g_r'(\alpha)^2}{g_r(\alpha)^2}$$

we see that $g_r''(1/2) < 0$ since $(\ln g_r)''(1/2) < 0$. □
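The numerical condition used in this proof, namely that $2^{k-1}\ln 2 \times 2^{2-k}\, 0.9^{k-2} k^2 / (1 - 2^{1-k})^2 - 4$ is negative from $k = 74$ on, can be checked with a few lines of Python. The function name is illustrative.

```python
import math

def lemma5_expression(k):
    """2^(k-1) ln 2 * 2^(2-k) * 0.9^(k-2) * k^2 / (1 - 2^(1-k))^2 - 4."""
    bound_on_r = 2.0 ** (k - 1) * math.log(2)
    bound_on_f2 = 2.0 ** (2 - k) * 0.9 ** (k - 2) * k ** 2
    return bound_on_r * bound_on_f2 / (1 - 2.0 ** (1 - k)) ** 2 - 4

# Sign of the expression just below and at the constant used in Lemma 5.
signs = {k: lemma5_expression(k) < 0 for k in (73, 74)}
```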
Proof of Lemma 6. By the definition of $g_r$ we see that $g_r(\alpha) < g_r(1/2)$ if and only if

$$\left(\frac{f(\alpha)}{f(1/2)}\right)^r < 2\alpha^\alpha (1-\alpha)^{1-\alpha} . \tag{17}$$

Letting $h(\alpha) = -\alpha\ln\alpha - (1-\alpha)\ln(1-\alpha)$ denote the entropy function, we see that (17) holds as
long as

$$\frac{r}{\ln 2 - h(\alpha)} < \frac{1}{\ln(1+w)}$$

where

$$w = \frac{f(\alpha) - f(1/2)}{f(1/2)} .$$

Observe now that for $k \ge 3$, $f$ is strictly increasing in $(1/2, 1]$, so $w > 0$. Moreover, for any $x > 0$

$$\frac{1}{\ln(1+x)} \ge \frac{1}{x} + \frac{1}{2} - \frac{x}{12} .$$
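The elementary inequality $1/\ln(1+x) \ge 1/x + 1/2 - x/12$ can be spot-checked numerically. The Python sketch below evaluates the gap between the two sides on a logarithmic grid; the grid range is an arbitrary choice and the check is an illustration, not a proof.

```python
import math

def inverse_log_gap(x):
    """1/ln(1+x) - (1/x + 1/2 - x/12), expected to be nonnegative for x > 0."""
    return 1 / math.log1p(x) - (1 / x + 0.5 - x / 12)

samples = [10 ** (e / 10) for e in range(-30, 31)]  # x from 1e-3 to 1e3
gaps = [inverse_log_gap(x) for x in samples]
```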

Since $f(\alpha) - f(1/2) = 2^{1-k}(\alpha^k + (1-\alpha)^k - 2^{1-k}) < 2^{1-k}$ and $f(1/2) = (1 - 2^{1-k})^2 > 1 - 2^{2-k}$, we
thus see that it suffices to have

$$\frac{r}{\ln 2 - h(\alpha)} < \frac{2^{k-1} - 2}{\alpha^k + (1-\alpha)^k - 2^{1-k}} + \frac{1}{2} - \frac{2^{1-k}}{12} .$$

Now observe that for any $0 < \alpha < 1$ and $0 \le q < \alpha^k$,

$$\frac{1}{\alpha^k - q} \ge 1 + k(1-\alpha) + q . \tag{18}$$

Since $\alpha > 1/2$ we can set $q = 2^{1-k} - (1-\alpha)^k$, yielding

$$\frac{1}{\alpha^k + (1-\alpha)^k - 2^{1-k}} \ge 1 + k(1-\alpha) + 2^{1-k} - (1-\alpha)^k .$$

Since $2^k(1-\alpha)^k < 5^{-k}$, we find that (17) holds as long as $r < \phi(\alpha) - 2^{-k}$ where

$$\phi(\alpha) \equiv (\ln 2 - h(\alpha))\left(2^{k-1} + (2^{k-1} - 2)k(1-\alpha) - \frac{1}{2}\right) .$$



We are thus left to minimize $\phi$ in $(0.9, 1]$. Since $\phi$ is differentiable its minima can only occur at
0.9 or 1, or where $\phi' = 0$. The derivative of $\phi$ is

$$\phi'(\alpha) = (2^{k-1} - 2) \times \left[-k(\ln 2 - h(\alpha)) + (\ln\alpha - \ln(1-\alpha))\left(1 + k(1-\alpha) + \frac{3}{2^k - 4}\right)\right] .$$

Note now that for all $k > 1$

$$\lim_{\alpha \to 1}\phi'(\alpha) = -\frac{2^k - 1}{2}\lim_{\alpha \to 1}\ln(1-\alpha)$$

is positively infinite. At the same time,

$$\phi'(0.9) < -0.07 \times 2^k k + 1.1\,(2^k - 1) + 0.3$$

is negative for $k \ge 16$. Therefore, $\phi$ is minimized in the interior of $(0.9, 1]$ for all $k \ge 16$. Setting $\phi'$
to zero gives

$$-\ln(1-\alpha) = \frac{k(\ln 2 - h(\alpha))}{1 + k(1-\alpha) + 3/(2^k - 4)} - \ln\alpha . \tag{19}$$

By "bootstrapping" we derive a tightening series of lower bounds on the solution for the l.h.s.
of (19) for $\alpha \in (0.9, 1)$. Note first that we have an easy upper bound,

$$-\ln(1-\alpha) < k\ln 2 - \ln\alpha . \tag{20}$$

At the same time, if $k > 2$ then $3/(2^k - 4) < 1$, implying

$$-\ln(1-\alpha) > \frac{k(\ln 2 - h(\alpha))}{2 + k(1-\alpha)} - \ln\alpha . \tag{21}$$

If we write $k(1-\alpha) = B$ then (21) becomes

$$-\ln(1-\alpha) > \frac{\ln 2 - h(\alpha)}{1-\alpha}\left(\frac{B}{2+B}\right) - \ln\alpha . \tag{22}$$

By inspection, if $B \ge 3$ the r.h.s. of (22) is greater than the l.h.s. for all $\alpha > 0.9$, yielding a
contradiction. Therefore, $k(1-\alpha) < 3$ for all $k > 2$. Since $\ln 2 - h(\alpha) > 0.36$ for $\alpha > 0.9$, we see
that for $k > 2$, (21) implies

$$-\ln(1-\alpha) > 0.07k . \tag{23}$$

Finally, observe that (23) implies that as $k$ increases the denominator of (19) approaches 1.
To bootstrap, we note that since $\alpha > 1/2$ we have

$$\begin{aligned} h(\alpha) &\le -2(1-\alpha)\ln(1-\alpha) && (24) \\ &< 2e^{-0.07k}(k\ln 2 - \ln 0.9) && (25) \\ &< 2k\,e^{-0.07k} \end{aligned}$$

where (25) relies on (20) and (23). Moreover, $\alpha > 1/2$ implies $-\ln\alpha \le 2(1-\alpha) < 2e^{-0.07k}$. Thus,
by using the bounds above and the fact $1/(1+x) > 1 - x$ for all $x > 0$, (19) gives for $k \ge 3$,

$$\begin{aligned} -\ln(1-\alpha) &> \frac{k(\ln 2 - h(\alpha))}{1 + k(1-\alpha) + 3/(2^k - 4)} \\ &> \frac{k(\ln 2 - 2k e^{-0.07k})}{1 + 2k e^{-0.07k}} \\ &> k(\ln 2 - 2k e^{-0.07k})(1 - 2k e^{-0.07k}) \\ &> k\ln 2 - 4k^2 e^{-0.07k} . && (26) \end{aligned}$$

For $k \ge 166$, $4k^2 e^{-0.07k} < 1$. Thus, by (26), we have $1 - \alpha < 3 \times 2^{-k}$. This, in turn, implies
$-\ln\alpha \le 2(1-\alpha) < 6 \times 2^{-k}$ and so, by (24) and (20), we have for $\alpha > 0.9$

$$h(\alpha) < 6 \times 2^{-k}(k\ln 2 - \ln\alpha) < 5k\,2^{-k} . \tag{27}$$
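The constant 166 in this step can be confirmed numerically. The short Python sketch below (names are illustrative) evaluates the error term $4k^2 e^{-0.07k}$ and finds the first $k$ past its peak, which occurs near $k = 2/0.07$, at which the term drops below 1.

```python
import math

def bootstrap_error(k):
    """The error term 4 k^2 e^(-0.07 k) appearing in (26)."""
    return 4 * k ** 2 * math.exp(-0.07 * k)

# The term increases up to k ~ 2/0.07 ~ 28.6 and decreases afterwards,
# so searching from k = 29 upwards finds the point after which it stays below 1.
smallest_k = min(k for k in range(29, 500) if bootstrap_error(k) < 1)
```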

Plugging (27) into (19) to bootstrap again, we get that for $k \ge 3$

$$\begin{aligned} -\ln(1-\alpha) &> \frac{k(\ln 2 - 5k2^{-k})}{1 + 3k2^{-k} + 3/(2^k - 4)} \\ &> \frac{k(\ln 2 - 5k2^{-k})}{1 + 6k2^{-k}} \\ &> k(\ln 2 - 5k2^{-k})(1 - 6k2^{-k}) \\ &> k\ln 2 - 11k^2 2^{-k} . \end{aligned}$$

Since $e^x < 1 + 2x$ for $x < 1$ and $11k^2 2^{-k} < 1$ for $k > 10$, we see that for such $k$

$$1 - \alpha < 2^{-k} + 22k^2 2^{-2k} .$$

Plugging into (20) the fact $-\ln\alpha < 6 \times 2^{-k}$ we get $-\ln(1-\alpha) < k\ln 2 + 6 \times 2^{-k}$. Using that
$e^{-x} \ge 1 - x$ for $x \ge 0$, we get the closely matching upper bound,

$$1 - \alpha > 2^{-k} - 6 \times 2^{-2k} .$$

Thus, we see that for $k \ge 166$, $\phi$ is minimized at an $\alpha_{\min}$ which is within $\delta$ of $1 - 2^{-k}$, where
$\delta = 22k^2 2^{-2k}$. Let $T$ be the interval $[1 - 2^{-k} - \delta,\, 1 - 2^{-k} + \delta]$. Clearly the minimum of $\phi$ is at least
$\phi(1 - 2^{-k}) - \delta \times \max_{\alpha \in T}|\phi'(\alpha)|$. It is easy to see from the expression for $\phi'$ that if $\alpha \in T$ then $|\phi'(\alpha)| \le 2k2^k$.

Now, a simple calculation using that $\ln(1 - 2^{-k}) > -2^{-k} - 2^{-2k}$ for $k > 1$ gives

$$\begin{aligned} \phi(1 - 2^{-k}) &= \frac{1}{2}\left((2^k - k)\ln 2 + (2^k - 1)\ln(1 - 2^{-k})\right) \times \left(1 + (k-1)2^{-k} - k2^{2-2k}\right) \\ &> 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - k^2 2^{-k} . \end{aligned}$$

Therefore,

$$\phi_{\min} \ge 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - 45k^3 2^{-k} .$$

Finally, recall that (17) holds as long as $r < \phi_{\min} - 2^{-k}$, i.e.,

$$r < 2^{k-1}\ln 2 - \frac{\ln 2}{2} - \frac{1}{2} - 46k^3 2^{-k} .$$

Clearly, we can take $k_0 = O(\ln \varepsilon^{-1})$ so that for all $k \ge k_0$ the error term $46k^3 2^{-k}$ is smaller than
any $\varepsilon > 0$. □



8 Proof of Lemma 3

The idea behind Lemma 3 is that sums of this type are dominated by the contribution of $\Theta(n^{1/2})$
terms around the maximum term. The proof amounts to replacing the sum by an integral and
using the Laplace method for asymptotic integrals [22].

We start by establishing two upper bounds for the terms of $S_n$, one crude and one sharp. For
the sharp bound we will use the following form of Stirling's approximation, valid for all $n > 0$:

$$\sqrt{2\pi n} < \frac{n!}{(n/e)^n} < \sqrt{2\pi n}\,(1 + 1/n) . \tag{28}$$
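The two-sided bound (28) is easy to test numerically for moderate $n$. The Python sketch below computes $n!/(n/e)^n$ through logarithms to avoid overflow; the function names are illustrative and the check is a sanity test for a finite range of $n$ only.

```python
import math

def stirling_ratio(n):
    """n! / (n/e)^n, computed via lgamma to avoid overflow."""
    return math.exp(math.lgamma(n + 1) - n * (math.log(n) - 1))

def satisfies_28(n):
    """Check sqrt(2 pi n) < n!/(n/e)^n < sqrt(2 pi n) (1 + 1/n)."""
    lower = math.sqrt(2 * math.pi * n)
    return lower < stirling_ratio(n) < lower * (1 + 1 / n)
```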

Recall that the $z$th term of $S_n$ is $\binom{t}{z}^q \phi(z/t)^n$, where $n = qt$ and $\phi(\alpha) = g(\alpha)\,\alpha^\alpha (1-\alpha)^{1-\alpha}$. Fix
any $\delta > 0$ and suppose that $z = \alpha t$ where $\alpha \in [\delta, 1-\delta]$. Then (28) yields

$$\binom{t}{z}^q \phi(z/t)^n < s(\alpha)\, g(\alpha)^n \left(1 + \frac{q}{n}\right)^q , \tag{29}$$

where $s(\alpha) = (2\pi\alpha(1-\alpha)t)^{-q/2}$. In addition to (29), valid for $z \in [t\delta, t(1-\delta)]$, we will also use a
cruder bound, valid for all $0 \le z \le t$. Namely, using the upper bound of (28) for $t!$ and the lower
bound $n! \ge (n/e)^n$ for $z!$ and $(t-z)!$ (where we take $0^0 \equiv 1$) we get (since $1 + 1/t \le 2$)

$$\binom{t}{z}^q \phi(z/t)^n < (8\pi n)^{q/2}\, g(\alpha)^n . \tag{30}$$

Recall now that $g(\alpha_{\max}) > g(\alpha)$ for all $\alpha \ne \alpha_{\max}$. If $I_\varepsilon$ denotes the interval $[\alpha_{\max} - \varepsilon, \alpha_{\max} + \varepsilon]$
then for every $\varepsilon > 0$, there exists a constant $g_\varepsilon < g(\alpha_{\max}) = g_{\max}$ such that $g(\alpha) < g_\varepsilon$ for all $\alpha \notin I_\varepsilon$.
Let $z^- = \lfloor(\alpha_{\max} - \varepsilon)t\rfloor$ and $z^+ = \lceil(\alpha_{\max} + \varepsilon)t\rceil$, and let

$$\widetilde{S}_n = \sum_{z=z^-}^{z^+} \binom{t}{z}^q \phi(z/t)^n . \tag{31}$$

We use (29) to bound the terms in $\widetilde{S}_n$ and (30) to bound the remaining terms of $S_n$. Since
$\lim_{n\to\infty}(1 + q/n)^q = 1$, and since $\lim_{n\to\infty} n^s g_\varepsilon^n / g_{\max}^n = 0$ for any $s$, we see that for every $\varepsilon > 0$

2 + 

5 n < (27ri)- 9/2 x £ . (32) 

Say that a twice-differentiable function $\psi(x)$ is unimodal on an interval $[a,b]$ if $\psi'$ has a unique
zero $c \in [a,b]$ with $a < c < b$, and furthermore $\psi''(c) < 0$. Since $g_{\max} > g(\alpha)$ for all $\alpha \ne \alpha_{\max}$ and
$g''(\alpha_{\max}) < 0$, we can take $\varepsilon$ small enough so that $g$ is unimodal on $I_\varepsilon$. This implies that $\ln g$ is also
unimodal on $I_\varepsilon$ and, for $n \ge 1$, that $g^n$ is unimodal also. Since $g^n$ is unimodal,

$$\sum_{z=z^-}^{z^+} g(z/t)^n \le n\int_{I_\varepsilon} g(x)^n\,dx + g_{\max}^n . \tag{33}$$

We evaluate this last integral using Lemma 7, i.e., the Laplace method for asymptotic integrals.

Lemma 7 ([22, §4]) Let $h(x)$ be unimodal on $[a,b]$ where $c$ is the unique zero of $h'$ in $[a,b]$. Then, as $n \to \infty$,

$$\int_a^b e^{nh(x)}\,dx \sim \sqrt{\frac{2\pi}{n|h''(c)|}}\; e^{nh(c)} .$$
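The Laplace approximation can be illustrated numerically. The Python sketch below compares a midpoint-rule estimate of $\int_a^b e^{nh(x)}dx$ with $\sqrt{2\pi/(n|h''(c)|)}\,e^{nh(c)}$ for the hypothetical example $h(x) = \ln(4x(1-x))$ on $[0.1, 0.9]$, for which $c = 1/2$, $h(c) = 0$ and $h''(c) = -8$. The example function, interval and number of integration pieces are all choices made for this sketch.

```python
import math

def laplace_estimate(n, h_at_c, h2_at_c):
    """sqrt(2 pi / (n |h''(c)|)) * exp(n h(c))."""
    return math.sqrt(2 * math.pi / (n * abs(h2_at_c))) * math.exp(n * h_at_c)

def midpoint_integral(func, a, b, pieces=100000):
    """Plain midpoint rule on [a, b]."""
    width = (b - a) / pieces
    return width * sum(func(a + (i + 0.5) * width) for i in range(pieces))

# e^{n h(x)} = (4x(1-x))^n for h(x) = ln(4x(1-x)).
n = 2000
integral = midpoint_integral(lambda x: (4 * x * (1 - x)) ** n, 0.1, 0.9)
ratio = integral / laplace_estimate(n, 0.0, -8.0)
```

For this example the ratio should approach 1 as $n$ grows, with a relative error of order $1/n$.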



Applying Lemma 7 to (33) with $h = \ln g$ and $c = \alpha_{\max}$, we see that $S_n < C \times n^{-(q-1)/2} \times g_{\max}^n$
for a constant $C$ that depends only on $q$, $\alpha_{\max}$, $g_{\max}$ and $g''(\alpha_{\max})$. □

Lemma 3 has the following obvious corollary, which is useful for a variety of second moment
calculations.

Corollary 3 Let $S_n$ and $g(\alpha)$ be defined as in Lemma 3. If there exists $\alpha_{\max} \in (0,1)$ with
$g(\alpha_{\max}) = g_{\max}$ and a constant $A > 0$ such that $g(\alpha) \le g_{\max} - A(\alpha - \alpha_{\max})^2$ for all $\alpha$, then
there exists a constant $C = C(q, g_{\max}, \alpha_{\max}, A) > 0$ such that

$$S_n < C \times n^{-(q-1)/2} \times g_{\max}^n .$$



9 Conclusions 

Before this work, lower bounds on the thresholds of random constraint satisfaction problems were
largely derived by analyzing very simple heuristics. Here, instead, we derive such bounds by
applying the second moment method to the number of solutions. In particular, for random NAE
$k$-SAT and random hypergraph 2-colorability we determine the location of the threshold within
a small additive constant for all $k$. As a corollary, we establish that the asymptotic order of the
random $k$-SAT threshold is $\Theta(2^k)$, answering a long-standing open question.

Since this work first appeared [5], our methods have been extended and applied to other
problems. For random $k$-SAT, Achlioptas and Peres [7] confirmed our suspicion (see Section 3.3)
that the main source of correlations in random $k$-SAT is the "populist" tendency of satisfying
assignments towards the majority vote assignment. By considering a carefully constructed random
variable which focuses on balanced solutions, i.e., on satisfying assignments that satisfy roughly
half of all literal occurrences, they showed $r_k \ge 2^k\ln 2 - k/2 - O(1)$, establishing $r_k \sim 2^k\ln 2$.

Achlioptas, Naor and Peres then extended the approach of balanced solutions to Max $k$-SAT.
Let us say that a $k$-CNF formula is $p$-satisfiable if there exists a truth assignment which satisfies at
least a $(1 - 2^{-k} + p2^{-k})$ fraction of all clauses; note that every $k$-CNF is 0-satisfiable. For $p \in (0,1]$ let $r_k(p)$
denote the threshold for $F_k(n, m = rn)$ to be $p$-satisfiable (so that $r_k(1) = r_k$). In that work, the result
$r_k = r_k(1) \sim 2^k\ln 2$ of [7] was extended to all $p \in (0,1]$ showing

$$r_k(p) \sim \frac{2^k\ln 2}{p + (1-p)\ln(1-p)} .$$

In both [7] and the Max $k$-SAT work of Achlioptas, Naor and Peres, controlling the variance crucially
depends on focusing on an appropriate subset of solutions (akin to our NAE-assignments, but less
heavy-handed). In [2], Achlioptas and Naor applied the naive second moment method to the canonical
symmetric constraint satisfaction problem, i.e., to the number of $k$-colorings of a random graph.
Bearing out our belief that the naive approach should work for symmetric problems, they obtained
asymptotically tight bounds for the $k$-colorability threshold. The difficulty there is that the
"overlap parameter" is a $k \times k$ matrix rather than a single real $\alpha \in [0,1]$. Since $k \to \infty$, this makes
the asymptotic analysis dramatically harder and much closer to the realm of statistical mechanics calculations.

We propose several questions for further work. 

1. Does the second moment method give tight lower bounds on the threshold of all constraint
satisfaction problems with a permutation symmetry?

2. Does it perform well for problems that are symmetric "on average"? For example, does it
perform well for regular random $k$-SAT where every literal appears an equal number of times?

3. What rigorous connections can be made between the success of the second moment method
and the notion of "replica symmetry" in statistical physics?

4. Is there a polynomial-time algorithm that succeeds with uniformly positive probability close
to the threshold, or at least for $r = \omega(k) \times 2^k/k$ where $\omega(k) \to \infty$?